Draft

Branch updated from e774818 to a1f47be.

ehuss reviewed on Mar 4, 2026, on src/frontmatter.md (outdated):
r[frontmatter.body]
No line in the body may start with a sequence of hyphens (`-`) equal to or longer than the opening fence. The body may not contain carriage returns.

[horizontal whitespace]: whitespace.md#grammar-HORIZONTAL_WHITESPACE
ehuss (Contributor):

If you want, I think this can use the automatic link:

Suggested change:
- [horizontal whitespace]: whitespace.md#grammar-HORIZONTAL_WHITESPACE
+ [horizontal whitespace]: grammar-HORIZONTAL_WHITESPACE
Contributor (Author):

PR to make that work:
Currently, in our Markdown, we support `[text][RULE_NAME]` and `[text][grammar-RULE_NAME]` for linking to grammar rules, but we don't support this syntax within link reference definitions, i.e., `[text]: grammar-RULE_NAME`, even though we do support linking to (non-grammar) rule identifiers within link reference definitions. That's an inconsistency that continually surprises us. Let's fix that.

In this commit, we add `grammar_link_references`, which scans link reference definitions for destinations that match a grammar rule name -- either with a `grammar-` prefix or not. When a match is found, the destination is replaced with the resolved path and anchor, just as `rule_link_references` does for rules. Unrecognized destinations pass through unchanged, falling through to `std_links` for rustdoc resolution -- the same behavior as unresolved `[text][NAME]` reference links.

We also update the dev-guide to document the new feature in both `links.md` and `grammar.md`.
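To make the rewrite concrete, here's a minimal sketch of the kind of pass `grammar_link_references` performs. It isn't the actual mdbook-spec code: the signature, the `rules` map (rule name to defining chapter), and the line-by-line matching are assumptions for illustration, and real link reference definitions allow more syntax (indentation, titles, angle-bracketed destinations) than this handles.

```rust
use std::collections::HashMap;

/// Hypothetical sketch: for each link reference definition (`[label]: dest`),
/// if `dest` names a known grammar rule, with or without a `grammar-` prefix,
/// rewrite it to `path#grammar-NAME`. `rules` maps a rule name to the chapter
/// that defines it. All other lines, including unrecognized destinations,
/// pass through unchanged. Lines are rejoined with `\n`.
fn grammar_link_references(content: &str, rules: &HashMap<&str, &str>) -> String {
    content
        .lines()
        .map(|line| resolve_definition(line, rules).unwrap_or_else(|| line.to_string()))
        .collect::<Vec<_>>()
        .join("\n")
}

/// Returns the rewritten definition, or `None` if `line` isn't a link
/// reference definition whose destination names a known grammar rule.
fn resolve_definition(line: &str, rules: &HashMap<&str, &str>) -> Option<String> {
    let rest = line.strip_prefix('[')?;
    let (label, dest) = rest.split_once("]: ")?;
    let dest = dest.trim();
    // Accept both `grammar-NAME` and bare `NAME`.
    let name = dest.strip_prefix("grammar-").unwrap_or(dest);
    let path = rules.get(name)?;
    Some(format!("[{label}]: {path}#grammar-{name}"))
}
```

With `HORIZONTAL_WHITESPACE` mapped to `whitespace.md`, both `[ws]: grammar-HORIZONTAL_WHITESPACE` and `[ws]: HORIZONTAL_WHITESPACE` resolve to `whitespace.md#grammar-HORIZONTAL_WHITESPACE`, while `[Vec]: std::vec::Vec` comes through unchanged and would be left for the `std_links` pass. The sketch also drops a trailing newline, which a real preprocessor would need to preserve.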
Branch updated from c36c071 to ec0193c.
The prior commit added a grammar for frontmatter, but the grammar notation available at the time that commit was prepared couldn't express all of the invariants the language requires. Opening and closing fences must have the same dash count. Indented fences must be rejected as an error. And once an opening fence is recognized, the parser must commit -- it can't backtrack and reinterpret the dashes as tokens.

Since then, we've added named range repeats, hard cut, and negative lookahead to the grammar notation. With these, we can express the invariants directly.

In this commit, we rewrite the frontmatter grammar. Named range repeats let the closing fence reference the opening fence's dash count. Hard cut commits the parse after the opening dashes. And `FRONTMATTER_INVALID` uses hard cut followed by the bottom rule (`^ ⊥`) to express that indented fences are a recognized-and-rejected syntactic form.

We also add `⊥` as a primitive production in the Notation chapter, move `HORIZONTAL_WHITESPACE` to Whitespace, and fix some minor editorial matters such as indentation and comment style.
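As a rough illustration of these invariants, here's a hypothetical hand-written scanner. It isn't derived from the Reference's grammar, and the `Scan` type and `scan_frontmatter` function are made up for this example. It counts the opening dashes, enforces the 3 to 255 bound, commits once a fence is seen, rejects an indented fence, and requires the closing fence to repeat the opening dash count exactly (allowing trailing horizontal whitespace). It doesn't validate the infostring or the body.

```rust
/// What the scanner finds at a position where frontmatter may begin
/// (the start of a line, after any shebang and preceding whitespace).
#[derive(Debug, PartialEq)]
enum Scan<'a> {
    /// No fence; the input is lexed as ordinary tokens.
    NotFrontmatter,
    /// A complete block: the opening dash count and the text between fences.
    Block { dashes: usize, inner: &'a str },
    /// A fence was recognized and the parse committed, but the block is invalid.
    Invalid(&'static str),
}

fn is_horizontal_whitespace(c: char) -> bool {
    c == ' ' || c == '\t'
}

fn scan_frontmatter(input: &str) -> Scan<'_> {
    let mut lines = input.split_inclusive('\n');
    let Some(open) = lines.next() else {
        return Scan::NotFrontmatter;
    };
    let unindented = open.trim_start_matches(is_horizontal_whitespace);
    let dashes = unindented.bytes().take_while(|&b| b == b'-').count();
    if dashes < 3 {
        return Scan::NotFrontmatter;
    }
    // Hard cut: once three or more dashes are seen, every failure below is
    // an error, never a fallback to lexing the dashes as tokens.
    if unindented.len() != open.len() {
        // Like `FRONTMATTER_INVALID`: recognized, then rejected.
        return Scan::Invalid("indented fence");
    }
    if dashes > 255 {
        return Scan::Invalid("opening fence longer than 255 hyphens");
    }
    let body_start = open.len();
    let mut offset = body_start;
    for line in lines {
        let content = line
            .trim_end_matches('\n')
            .trim_end_matches(is_horizontal_whitespace);
        // The closing fence must have exactly as many dashes as the opening one.
        if content.len() == dashes && content.bytes().all(|b| b == b'-') {
            return Scan::Block { dashes, inner: &input[body_start..offset] };
        }
        offset += line.len();
    }
    Scan::Invalid("missing closing fence")
}
```

For example, `----\n---\n----\n` scans as a block whose body is `---\n`: the inner line is shorter than the opening fence, so it's body content rather than a closing fence.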
The fence description uses the phrase "a matching pair of hyphens",
which can be misread as describing exactly two individual hyphens.
The constraints on fence length and matching are also compressed into
a single sentence with a trailing subclause ("from 3 to 255") that
reads as nonrestrictive.
Let's give each constraint its own sentence: what a fence is, where
it must appear, the length bounds on the opening fence, the matching
requirement for the closing fence, and trailing whitespace. This
makes the structure clearer.
The infostring sentence uses an inverted construction ("Following the
opening fence may be an infostring"); it's a bit awkward.
Let's use active voice and tighten the phrasing.
The body restriction sentence combines two unrelated constraints -- the hyphen-line restriction and the carriage-return ban -- in a single sentence joined by "or". This makes "or carriage returns" read as parallel to "hyphens", as though a line could start with carriage returns. Let's split these into two separate sentences so that each constraint stands on its own.
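Split this way, the two constraints can also be checked independently. Here's a minimal sketch, assuming the body has already been separated from its fences; `check_body` and its error strings are hypothetical.

```rust
/// Checks a frontmatter body against both constraints, one at a time.
/// `fence_len` is the number of hyphens in the opening fence.
fn check_body(body: &str, fence_len: usize) -> Result<(), String> {
    // No line may start with `fence_len` or more hyphens.
    for (i, line) in body.lines().enumerate() {
        let leading = line.bytes().take_while(|&b| b == b'-').count();
        if leading >= fence_len {
            return Err(format!("line {}: starts with {leading} hyphens", i + 1));
        }
    }
    // Separately, no carriage returns anywhere. Check the raw text, since
    // `str::lines` would silently strip a `\r` that precedes `\n`.
    if body.contains('\r') {
        return Err("body contains a carriage return".to_string());
    }
    Ok(())
}
```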
The prose mentions "horizontal whitespace" in two places (fence trailing content and infostring trailing content) without linking to the grammar definition. Since `HORIZONTAL_WHITESPACE` is now a defined production in Whitespace, let's add a link so readers can click through to the precise definition.
The frontmatter removal section in `input-format.md` is a
single sentence ("After some whitespace, frontmatter may next
appear in the input") that doesn't clearly describe the removal
behavior. By contrast, the shebang removal section provides a full
description with an example.
Let's rewrite the section with a precise description of the removal
process and add an annotated example.
The `frontmatter.document` rule said "Frontmatter may only be preceded by a shebang and whitespace", where the "and" could be misread as requiring both a shebang and whitespace rather than listing the set of things allowed to precede frontmatter. Since we merged the shebang prose revision (#2192), the shebang position rule now reads as a positive statement of where the shebang may appear. Let's follow the same pattern here: state positively where frontmatter may appear rather than leaning on "only" and a negative constraint. We'll also rename the rule identifier to `frontmatter.position` in keeping with our conventions.
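Here's a sketch of the position rule stated the same positive way: skip an optional shebang line, then any whitespace-only lines, and the fence must begin the next line. It assumes whitespace means `Pattern_White_Space` and uses a simplified shebang test (a first line starting with `#!` but not `#![`); the Reference's shebang rule is more involved. The name `frontmatter_position` is hypothetical.

```rust
/// Pattern_White_Space, the set the Reference cites for whitespace.
fn is_pattern_white_space(c: char) -> bool {
    matches!(
        c,
        '\u{0009}'..='\u{000D}' | '\u{0020}' | '\u{0085}'
            | '\u{200E}' | '\u{200F}' | '\u{2028}' | '\u{2029}'
    )
}

/// Returns the byte offset where a frontmatter fence begins, if one appears
/// in an allowed position: after an optional shebang line and any
/// whitespace-only lines. Simplified shebang test (see lead-in).
fn frontmatter_position(src: &str) -> Option<usize> {
    let mut offset = 0;
    if src.starts_with("#!") && !src.starts_with("#![") {
        offset = src.find('\n').map_or(src.len(), |i| i + 1);
    }
    for line in src[offset..].split_inclusive('\n') {
        if line.trim_matches(is_pattern_white_space).is_empty() {
            offset += line.len();
        } else {
            // The fence must begin the line; an indented fence doesn't qualify.
            return line.starts_with("---").then_some(offset);
        }
    }
    None
}
```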
The example under `frontmatter.intro` used an external crate, a nontrivial script body, and a bare `rust` code block that would fail CI since the test runner doesn't support frontmatter. Let's simplify it to mirror the example in the frontmatter removal section of `input-format.md`, and let's wrap it in an `EXAMPLE` admonition consistent with our convention for examples that aren't demonstrating the behavior of a specific rule.
The intro under `frontmatter.intro` said "an optional section for content intended for external tools without requiring these tools to have full knowledge of the Rust grammar." This was a negative construction (what frontmatter doesn't require) rather than a positive one (what it is and what it enables). In this commit, we rewrite the intro as "an optional section of metadata whose syntax allows external tools to read it without parsing Rust." This tells the reader three things in one sentence: what frontmatter is, who it's for, and the key design property.
For the `WHITESPACE` grammar rule, we cite `Pattern_White_Space`. For `HORIZONTAL_WHITESPACE`, we hadn't cited provenance. Let's do that. Horizontal whitespace, in a Unicode context, is defined by UAX 31, Section 4.1, which categorizes `Pattern_White_Space` into line endings, ignorable format controls, and horizontal space. The horizontal space category is exactly the two characters our grammar specifies.
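To illustrate the provenance, here's a sketch that partitions the eleven `Pattern_White_Space` characters into the three UAX 31 categories described above and defines horizontal whitespace as the last of them. The enum and function names are hypothetical, and the grouping of the line-ending and format-control characters follows our reading of UAX 31, Section 4.1.

```rust
/// The categories UAX 31, Section 4.1, uses to partition Pattern_White_Space.
#[derive(Debug, PartialEq)]
enum PatternWhiteSpace {
    LineEnding,
    IgnorableFormatControl,
    HorizontalSpace,
}

/// Classifies a character, or returns `None` if it isn't Pattern_White_Space.
fn classify(c: char) -> Option<PatternWhiteSpace> {
    use PatternWhiteSpace::*;
    match c {
        '\u{000A}'..='\u{000D}' | '\u{0085}' | '\u{2028}' | '\u{2029}' => Some(LineEnding),
        '\u{200E}' | '\u{200F}' => Some(IgnorableFormatControl),
        '\u{0009}' | '\u{0020}' => Some(HorizontalSpace),
        _ => None,
    }
}

/// HORIZONTAL_WHITESPACE: exactly U+0009 (tab) and U+0020 (space).
fn is_horizontal_whitespace(c: char) -> bool {
    classify(c) == Some(PatternWhiteSpace::HorizontalSpace)
}
```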
Branch updated from ec0193c to 00c5777.
cc @ehuss
(Note the draft status. I put this up so @ehuss and I could discuss this in the lang-docs call, but it's not yet ready for general review -- I'm still revising.)